53 research outputs found
Semi-Supervised Speech Emotion Recognition with Ladder Networks
Speech emotion recognition (SER) systems find applications in various fields
such as healthcare, education, and security and defense. A major drawback of
these systems is their lack of generalization across different conditions. This
problem can be solved by training models on large amounts of labeled data from
the target domain, which is expensive and time-consuming. Another approach is
to increase the generalization of the models. An effective way to achieve this
goal is by regularizing the models through multitask learning (MTL), where
auxiliary tasks are learned along with the primary task. These methods often
require labeled data for the auxiliary tasks (gender, speaker identity, age or
other emotional descriptors), which is expensive to collect for emotion
recognition. This study proposes the use of ladder networks for emotion
recognition, which utilize an unsupervised auxiliary task. The primary task is
a regression problem to predict emotional attributes. The auxiliary task is the
reconstruction of intermediate feature representations using a denoising
autoencoder. This auxiliary task does not require labels so it is possible to
train the framework in a semi-supervised fashion with abundant unlabeled data
from the target domain. This study shows that the proposed approach creates a
powerful framework for SER, achieving better performance than fully
supervised single-task learning (STL) and MTL baselines. The approach is
implemented with several acoustic features, showing that ladder networks
generalize significantly better in cross-corpus settings. Compared to the STL
baselines, the proposed approach achieves relative gains in concordance
correlation coefficient (CCC) between 3.0% and 3.5% for within-corpus
evaluations, and between 16.1% and 74.1% for cross-corpus evaluations,
highlighting the power of the architecture.
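The abstract above reports results as relative gains in CCC. As a rough illustration of how such numbers can be computed, here is a minimal sketch assuming the standard definition of the concordance correlation coefficient (with population variances) and a relative gain defined as (new - baseline) / baseline. The function names `ccc` and `relative_gain` are hypothetical and do not come from the paper.

```python
import numpy as np


def ccc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D sequences.

    Assumes the standard definition:
    2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (biased) variance and covariance.
    """
    x = np.asarray(y_true, dtype=float)
    y = np.asarray(y_pred, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    cov_xy = np.mean((x - mean_x) * (y - mean_y))
    return 2.0 * cov_xy / (x.var() + y.var() + (mean_x - mean_y) ** 2)


def relative_gain(baseline, improved):
    """Relative gain of `improved` over `baseline` as a fraction.

    Assumed definition (not stated in the abstract); multiply by 100
    to obtain a percentage.
    """
    return (improved - baseline) / baseline


if __name__ == "__main__":
    labels = [1.0, 2.0, 3.0]
    # A constant offset lowers CCC even though the Pearson correlation is 1.
    print(ccc(labels, [2.0, 3.0, 4.0]))
    print(relative_gain(0.5, 0.6))
```

Unlike Pearson correlation, CCC penalizes differences in mean and scale between predictions and labels, which is one reason it is commonly used for evaluating predictions of emotional attributes.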
Ladder Networks for Emotion Recognition: Using Unsupervised Auxiliary Tasks to Improve Predictions of Emotional Attributes
Recognizing emotions using a few attribute dimensions such as arousal, valence
and dominance provides the flexibility to effectively represent a complex range
of emotional behaviors. Conventional methods to learn these emotional
descriptors primarily focus on separate models to recognize each of these
attributes. Recent work has shown that learning these attributes together
regularizes the models, leading to better feature representations. This study
explores new forms of regularization by adding unsupervised auxiliary tasks to
reconstruct hidden layer representations. This auxiliary task requires the
denoising of hidden representations at every layer of an auto-encoder. The
framework relies on ladder networks that utilize skip connections between
encoder and decoder layers to learn powerful representations of emotional
dimensions. The results show that ladder networks improve the performance of
the system compared to baselines that individually learn each attribute, and
to conventional denoising autoencoders. Furthermore, the unsupervised auxiliary
tasks have promising potential to be used in a semi-supervised setting, where
few labeled sentences are available.
Comment: Submitted to Interspeech 201
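The two abstracts above describe a cost that combines a supervised regression loss with per-layer denoising reconstruction losses, linked by skip connections between encoder and decoder. The following is a simplified NumPy sketch of that idea, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the fixed averaging combinator (real ladder networks learn this function), the absence of batch normalization, the layer sizes, and the weights `lambdas`.

```python
import numpy as np


def encoder(x, weights, noise_std, rng):
    """Return activations at every layer (input included).

    Gaussian noise with standard deviation `noise_std` is added at each
    layer; noise_std=0 gives the clean pass.
    """
    h = x + noise_std * rng.standard_normal(x.shape)
    acts = [h]
    for i, w in enumerate(weights):
        z = h @ w
        z = z + noise_std * rng.standard_normal(z.shape)
        # Hidden layers use ReLU; the top layer is linear (regression output).
        h = z if i == len(weights) - 1 else np.maximum(z, 0.0)
        acts.append(h)
    return acts


def ladder_cost(x, y, enc_w, dec_w, lambdas, noise_std=0.3, seed=0):
    """Supervised MSE (if y is given) plus weighted reconstruction costs.

    dec_w[l] maps layer l+1 back to layer l. Pass y=None for unlabeled
    data, in which case only the reconstruction terms contribute.
    """
    rng = np.random.default_rng(seed)
    clean = encoder(x, enc_w, 0.0, rng)
    noisy = encoder(x, enc_w, noise_std, rng)

    top = len(enc_w)
    recon_cost = 0.0
    u = None
    for l in range(top, -1, -1):
        if l == top:
            z_hat = noisy[l]
        else:
            # Skip connection: combine the noisy lateral activation with
            # the top-down signal (fixed average, a simplification).
            z_hat = 0.5 * (noisy[l] + u @ dec_w[l])
        recon_cost += lambdas[l] * np.mean((clean[l] - z_hat) ** 2)
        u = z_hat

    sup_cost = 0.0 if y is None else np.mean((noisy[-1] - y) ** 2)
    return sup_cost + recon_cost


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal((8, 5))   # 8 utterances, 5 acoustic features
    y = rng.standard_normal((8, 3))   # e.g. arousal, valence, dominance
    enc_w = [rng.standard_normal((5, 4)), rng.standard_normal((4, 3))]
    dec_w = [rng.standard_normal((4, 5)), rng.standard_normal((3, 4))]
    lambdas = [1.0, 0.1, 0.1]
    print(ladder_cost(x, y, enc_w, dec_w, lambdas))     # labeled batch
    print(ladder_cost(x, None, enc_w, dec_w, lambdas))  # unlabeled batch
```

Because the reconstruction terms need no labels, unlabeled batches can still contribute to the cost, which is what makes the semi-supervised setting mentioned in the abstract possible.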
Inclusive Market Oriented Development (IMOD) at ICRISAT
IMOD is the unifying conceptual framework for ICRISAT’s work for the period 2011-2020. It emerged from the extensive global consultations, analyses and deliberations of the 2010 Strategic Planning process. In a nutshell, IMOD is a development model that frames ICRISAT’s strategy to help the poor to harness markets while managing risks, in order to most effectively reduce poverty, hunger, malnutrition and environmental degradation across the dryland tropics. This brief bird's-eye view of IMOD and its origin sets the context for describing its features in more detail, below.